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Abstract 


We develop a Python tool to estimate the tail distribution of the number of dark matter halos beyond a mass 
threshold and in a given volume in a light-cone. The code is based on the extended Press-Schechter model and is 
computationally efficient, typically taking a few seconds on a personal laptop for a given set of cosmological 
parameters. The high efficiency of the code allows a quick estimation of the tension between cosmological models 
and the red candidate massive galaxies released by the James Webb Space Telescope, as well as scanning the 
theory space with the Markov Chain Monte Carlo method. As an example application, we use the tool to study the 
cosmological implication of the candidate galaxies presented in Labbé et al. The standard A cold dark matter 
(ACDM) model is well consistent with the data if the star formation efficiency can reach ~0.3 at high redshift. For 
a low star formation efficiency €^ 0.1, the ACDM model is disfavored at —26—3c confidence level. 
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1. Introduction 


In the standard A cold dark matter (ACDM) picture, baryons 
decouple from radiation after recombination and track the 
gravitational potential of dark matter. Before the formation of 
early galaxies, baryons and dark matter are well mixed. The 
fraction of baryonic mass in a massive dark matter halo 
therefore approximately equals the cosmological mean 
fo 5 9,/0,, where Q, and Q,, are the baryon and matter 
abundance parameters, respectively. If the fraction of stellar 
mass in baryonic matter, namely the star formation efficiency 
(SFE) € is known, we can connect the stellar mass M, to the 
halo mass Mya; via M, = efbMhaio. This provides a way to test 
cosmology by estimating stellar masses of massive galaxies in 
a given volume. In particular, at high redshift where massive 
halos are predicted to be very rare, detection of very massive 
galaxies may pose a stringent constraint on cosmology. The red 
candidate massive galaxies released by the James Webb Space 
Telescope (JWST), for instance, have inspired a debate as to 
whether a beyond-ACDM cosmology is necessary (Haslbauer 
et al. 2022; Wang et al. 2023c; Boylan-Kolchin 2023; Chen 
et al. 2023; Ferrara et al. 2023; Labbé et al. 2023; Lovell et al. 
2023; Qin et al. 2023). Plenty of beyond-ACDM models, 
including alternative dark energy (Menci et al. 2022; Wang 
et al. 2023b; Adil et al. 2023), non-standard dark matter (Bird 
et al. 2023; Dayal & Giri 2024; Lin et al. 2024; Maio & 
Viel 2023), cosmic strings (Jiao et al. 2023), primordial black 
holes (Gouttenoire et al. 2023; Guo et al. 2023; Hütsi et al. 
2023; Huang et al. 2023; Yuan et al. 2023) and many others 


(Lovyagin et al. 2022; John & Babu Joseph 2023; Lei et al. 
2024; Melia 2023), have been confronted with the JWST high- 
redshift candidate galaxies. Modification of primordial condi- 
tions may also lead to an overabundance of dark matter halos 
and hence more massive galaxies at high redshift (Huang 2019; 
Wang & Liu 2022; Forconi et al. 2023; Keller et al. 2023; 
Padmanabhan & Loeb 2023; Parashari & Laha 2023; Su et al. 
2023; Tkachev et al. 2024). 

The major obstacle to an accurate comparison between a 
theory and the corresponding observed galaxies is the 
uncertainty in the SFE e at high redshift. Galaxy formation 
models and low-redshift observation typically suggest e < 0.2 
(Guo et al. 2020; Boylan-Kolchin 2023), though e at high 
redshift may be significantly different (Zhang et al. 2022; Qin 
et al. 2023). Because the ultraviolet (UV) radiation from star 
formation can ionize the neutral hydrogen at the reionization 
epoch, a larger SFE tends to accelerate the reionization process 
and finish it at an earlier time. Observations of the Lyo emitter 
]uminosity function suggest that reionization is still ongoing at 
z2 7 (Goto et al. 2021; Wold et al. 2022), consistent with the 
€ $0.2 picture (Minoda et al. 2023; Mobina Hosseini et al. 
2023). It is however difficult to derive an upper-bound for e 
solely from the reionization history, due to the degeneracy 
between e and other astrophysical parameters. 

The common practice is then to study the cosmological 
implication for an assumed SFE. Although most of the studies 
qualitatively agree, quantitative discrepancies still exist. For 
example, while Boylan-Kolchin (2023) and Chen et al. (2023) 
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Figure 1. Workflow of observation-theory comparison. 


find that € ~ 0.5 suffices to explain the Labbé et al. (2023) data 
in the standard ACDM cosmology, Menci et al. (2022), who 
use the same data, claim that ACDM is ruled out at the 26 level 
even with e= 1. This is not only due to different choices of 
mass cut and volume cut, but also because it is hard to give an 
accurate description of the systematic errors of redshift and 
stellar mass obtained with spectral energy distribution (SED) 
fiting. By using seven individual redshift and stellar mass 
measurements with different SED-fitting codes/templates, 
Labbé et al. (2023) estimated extremes of stellar mass and 
redshift of each candidate galaxy. These extremes however are 
insufficient for quantitative analysis, which requires precise 
knowledge of the probability density functions (PDFs) of the 
systematic errors. 

The summary statistics of the statistical errors of stellar mass 
are often represented as M dr um , where M is the median mass. 
The upper and lower errors (AMgup and AM in) are defined with 
16th and 84th percentiles. The errors in stellar mass, and 
similarly in redshift if only photometry information is 
available, are typically asymmetric and difficult to model 
analytically. Accurate observation-theory comparison usually 
requires an end-to-end comparison between high-resolution 
simulations and observations. On the other hand, while the 
summary statistics (and the extremes from different SED-fitting 
methods, if available) contain limited information, they are 
important science products of many galaxy surveys. The aim of 
the present work is to develop a tool that performs a 
preliminary scan of the theory space based on these simple 
products, before planning subsequent spectroscopic observa- 
tion and running costly simulations. 


This paper is organized as follows. Section 2 describes the 
physical model of the abundance of high-redshift massive 
galaxies and the workflow of observation-theory comparison. 
Section 3 uses the tool to analyze the cosmological implication 
of the candidate galaxies in Labbé et al. (2023). Section 4 
concludes. 


2. Algorithm of the Tool 


Figure | summarizes our algorithm to do an observation- 
theory comparison. We consider galaxies beyond a stellar-mass 
threshold (M, > M, cur) and in a fixed comoving volume 
defined by the sky coverage fi and redshift range 
(Zmin < Z € Zmax). Because both the stellar mass and redshift 
from SED-fitting have significant uncertainties, whether an 
observed galaxy is above the stellar-mass threshold and in the 
volume is probabilistic. The total observed galaxies above the 
stellar-mass threshold and in the volume, denoted as Nobs, is 
therefore a random variable. Due to the cosmic variance and 
the Poisson shot noise, the theoretical prediction of the total 
number of galaxies above the stellar-mass threshold and in the 
volume, denoted as Nn, is also probabilistic. 

A model is ruled out when and only when Nn < Nobs. The 
tension between a cosmological model and the observed galaxies 
is therefore quantified by the probability P(N < Nos), or 
more conveniently, the smallness of P(Mn > Nj) = | — 
P(Nn < Ny). For instance, P(Ng > Ny) = 0.01 indicates 
that the model under consideration is ruled out at a 99% 
confidence level. Here and in what follows, P(-) stands for 
probability (for a discrete variable or an event) or PDF (for a 
continuous variable). 
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For a given cosmology, P(N, > 
steps. 


1. For Nobs = 0, 1, 2, ... compute the distribution function 
P(Nops) from summary statistics of observed galaxies and 
user-specified assumptions about the functional form of 
PDFs of stellar mass and redshift. 

2. For Na — 0, 1, 2, ... compute the distribution function 
P(Nn) from the cosmological model and a user- 
specified SFE. 

3. Compute 


Ny) is evaluated in three 


P(Na > Nov) = Y) POR) 32 PN. 


Nobs=9 Mn=Nobs 


Explicit evaluation of P (Np > Novs) requires perfect knowl- 
edge about the PDFs of statistical/systematic errors of the 
stellar masses and redshifts of the candidate galaxies. While the 
PDFs of statistical errors can be recovered by going back to the 
SED-fitting process, it is not easy to give an accurate 
description of the PDFs of systematic errors, which often 
dominate the error budget. Without further knowledge about 
the systematic errors, we simply model the PDFs with a few 
generic distribution functions: skew normal, skew triangular 
and skew rectangular, whose definitions are given in the 
Appendix. The code is organized in such a way that the users 
can easily add their own distribution functions. We also allow 
the users to turn off systematic errors by using the Dirac delta 
distribution. In the case when systematic errors are given in the 
form of extremes from different SED-fitting methods, we 
match the extremes with exact upper/lower bounds of skew 
triangular/rectangular distributions or 20 bounds of a skew 
normal distribution. Finally, the stellar mass M, and redshift z 
from different SED-fitting methods are typically positively 
correlated. We test the impact of the M,-z correlation by 
forcing (M, — M. median)(Z — Zmedian) 2 0 in the joint distribu- 
tion of M, and z, and find that the correlation does not make a 
qualitative difference. 

The evaluation of P(N») involves the extended Press— 
Schechter ellipsoidal collapse model (Press & Schechter 1974; 
Sheth et al. 2001; Sheth & Tormen 2002), where the comoving 
halo number density per mass interval, namely the halo mass 
function (at a fixed redshift z), is given by 


dn. Pm dinc 
dM -— fe m dM ` 


The pm on the right-hand side is the comoving average 
background matter density and ø is the mass fluctuation at scale 


1/3 
3M 
R= (4) , expressed as 


(1) 
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where k is the comoving wavenumber, and P,,,(k, z) is the linear 
matter power spectrum at redshift z. For the filter function W 
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(kR), we take the form of a top-hat 


3(sinx — x cos 
a (3) 
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W(x) = 


The simulation calibrated factor f(v) is written as 


1 av? iv av? 
melt) e 


with A=0.322, a=0.707, q—0.3 and v= Eo. 
parameter ó. = 1.686 corresponds to the critical linear over- 
density. We ignore the tiny difference in ô. for different 
cosmological models, as we are only interested in the models 
that are close to the concordance ACDM model. 

For a massive galaxy with stellar mass M, > M, cut, the mass 
of its host halo must exceed Mc = M, cur/(f,). The halo mass 
function Equation (1) allows computing the ensemble average 
of the mean number of such halos 


x dn dV 
(Ne) má f, Ar T argui O 


where 


where “i is comoving volume per redshift interval per solid 
angle. Here we have equalized the number of massive galaxies 
and the number of their host halos, assuming each massive halo 
has a central galaxy. 

For the purpose of using rare objects to study cosmological 
tensions, we are only interested in the (Nm) S O(1) case, where 


Nin approximately follows a Poisson distribution 


ANa 


P(Na) ~ e^ 
th! 


(6) 


where \ (Nin) is given in Equation (5). 
To obtain a more accurate result, we integrate on the right- 
hand side of Equation (6) over a Gaussian distribution, giving 


oo 1 | Q- (Na)? Mn 
Pn) = f =e We 
—0o0 T OX 


The beyond-Poisson cosmic variance from the large scale 
structure, a), can be read out from the online emulator at 
https:/ /www.ph.unimelb.edu.au/ - mtrenti /cvc/ Cosmic Variance. 
html (Trenti & Stiavelli 2008). For a pencil-like survey volume, 
we find the cosmic variance approximately equal to the product of 
the linear result, which equals the product of linear halo bias and 
matter cosmic variance, and a fudge factor 21.4 that accounts for 
the nonlinear correction. Since the cosmic-variance correction is a 
subdominant effect," this approximation suffices and allows us to 
make our tool self-contained. 


Nal dA. (7) 


^ Qur result does not contradict that of Chen et al. (2023), who claims that 
cosmic variance taken from their simulations is a dominant factor. Our “cosmic 
variance" here, as well as that defined in Trenti & Stiavelli (2008), refers to the 
variance due to power spectrum uncertainties caused by finite sampling and 
does not include Poisson shot noise, while “cosmic variance" taken from 
simulations includes both effects. 
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Figure 2. Dependence of P (Nn > Nos) on the SFE and the model of the PDF 
of systematic errors. 


The algorithm is realized with a Python tool publicly 
available at http: //zhiqihuang.top/codes/massivehalo.htm. 


3. Applying to JWST Data 


We now apply the tool to the 13 candidate galaxies in 
Labbé et al. (2023). The summary statistics of stellar mass and 
redshift are given in the Extended Data Table 2 of Labbé et al. 
(2023), as well as in the online repository given in Section 2. 
Stellar mass and redshift values include two uncertainties: 
(ran) + (sys), where the random uncertainty (ran) corre- 
sponds to the 16th and 84th percentiles of the combined 
posterior distribution, and the systematic uncertainty (sys) 
corresponds to the extreme values from seven different 
methods (EAZY, Prospector and Bagpipes with five varia- 
tions, including dust, SFH, age prior and SNR limit). In this 
study, we update the data of three candidate galaxies (L23- 
13050, L23-35300 and L23-39575) with the information from 
spectroscopic follow-up observations (Fujimoto et al. 2023; 
Kocevski et al. 2023; Pérez-González et al. 2023). The 
updated data are also available in the online repository given 
in Section 2. 

Unless otherwise specified, we choose the stellar-mass 
threshold to be M, cut = 10'°M., and the redshift range to be 
Zmin = 7 and Zmax = 10, and we assume the statistical errors 
follow a skew normal distribution. The data cover a sky area of 
38 arcmin?, which translates to foky = 2.56 x 1077. We take the 
cosmology to be the Planck best-fit ACDM model (Aghanim 
et al. 2020) with Q, = 0.3158, Q, = 0.049389, scalar spectral 
index n,— 0.96605, matter fluctuation parameter og = 0.812 
and Hubble constant Ho = 67.32kms ! Mpc |. Figure 2 
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shows the dependence of P(N 2 Novs) on the SFE and the 
model of the PDFs of systematic errors. For e = 0.3, which is 
fairly possible at the early epoch of galaxy formation (Zhang 
et al. 2022; Qin et al. 2023), the data are well consistent with 
the ACDM model even when the systematic errors are ignored 
(dotted blue line in Figure 2). For the most conservative e ~ 0.1 
case, the ACDM model is slightly disfavored by the data. For 
different forms of the PDFs of the systematic errors, the 
statistical significance of the tension varies between ~2c0 
and —3oc. 

It has been shown that the dynamic dark energy model, with 
equation of state w(a)— wo-d-w,(1— a) (Chevallier & 
Polarski 2001; Linder 2003), where a is the scale factor, may 
alleviate the tension between cosmology and the Labbé et al. 
(2023) data. Since the tension is only significant when the SFE 
€ is S0.1, we hereafter fix € — 0.1 and investigate whether the 
dynamic dark energy model is well consistent with the data. 
For simplicity, we also assume that the systematic errors follow 
the skew triangular distribution. We take cosmological 
parameters from the Markov chains that are sampled with 
Planck+BAO+SNe likelihood. Here Planck refers to the 
TTTEEE+lowl+LowE likelihood of the cosmic microwave 
background maps measured by the Planck satellite (Aghanim 
et al. 2020); BAO refers to the baryon acoustic oscillations data 
that are summarized in Alam et al. (2017); SNe stands for the 
Pantheon supernova catalog (Scolnic et al. 2018). We down- 
load the chains from Planck Legacy Archive (https: / /pla.esac. 
esa.int/pla/) and, for better visual effect, thin the chains by a 
factor of 5. For each set of cosmological parameters of the 
thinned chains, we compute the P(N, 2 Nos) value and show 
it in Figure 3 as a dot, whose color represents the value of 
P(Nn 2 Novs) and coordinates indicate the dark energy 
parameters (wo, Wa). The variation of the dynamic dark energy 
equation of state in the range that is allowed by Planck+BAO 
+SNe does lead to a variation of P(Mn > Nos), however, not 
at a very significant level that would make the result 
qualitatively different. 


4. Discussion and Conclusions 


In this work, we present and make publicly available an 
efficient numeric tool that can quickly estimate the tension 
between cosmological models and observed high-redshift 
massive galaxies. Our method is based on the statistics of the 
total numbers of massive halos, whereas the previous works are 
all based on the statistics of the cumulative stellar mass 
p«(» M.) (Menci et al. 2022; Boylan-Kolchin 2023; Chen 
et al. 2023; Labbé et al. 2023; Wang et al. 2023b). The 
advantage of using the number of halos as a fundamental 
variable is that the dominating Poisson shot noise contribution 
can be explicitly formulated. 
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Figure 3. P(N > Nos) for wo-w, cosmology. The SFE is fixed to be 0.1. 
Systematic errors are assumed to follow a skew triangular distribution. For each 
point, the cosmological parameters are drawn from thinned Planck+BAO 
+SNe Markov chains. The solid black lines are the marginalized 68.3% and 
95.4% confidence contours for the Planck+BAO-+SNe likelihood. 


We update the parameters of some of the candidate galaxies 
in Labbé et al. (2023) with recent spectroscopic follow-up 
observations. For a low SFE e~ 0.1, the data remain in >20 
tension with the concordance ACDM cosmology, and the 
tension is insensitive to the dark energy equation of state and 
other cosmological parameters, provided that the cosmological 
parameters are confined by the Planck+BAO+SNe likelihood. 
We also verify that the updated spectroscopic redshift 
information slightly reduces the tension between the ACDM 
model and the JWST candidate galaxies. For instance, 
assuming a skew normal distribution of both systematic and 
statistical errors and SFE «=0.1, we find the Planck best-fit 
ACDM model is ruled out at the 2.596 level with the original 
data from Labbé et al. (2023) and 2.490 level with the 
updated data. 

Although the algorithm in Figure 1 captures the main physics, 
there are subtleties beyond the scope of the present work. First, it 
is known that the extended Press-Schechter halo mass functions 
at high redshift are 2096—5096 higher than those obtained from 
N-body simulations (Reed et al. 2003; Despali et al. 2016; 
Shirasaki et al. 2021; Wang et al. 2022). Second, backsplash 
halos, whose dark matter has left the host halo while its gas may 
remain in the host halo (Balogh et al. 2000; Wang et al. 2009; 
Wetzel et al. 2014; Diemer 2021; Wang et al. 2023a), may make 
the interpretation of SFE ambiguous in the real world and may 
alter the statistical significance of the cosmological tension 
(Chen et al. 2023). Finally, we have ignored catastrophic redshift 
errors which, despite being rare, might have a significant impact 
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on the counting of rare objects. We leave further studies on these 
nontrivial subjects as our future work. 
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Appendix 
Modeling the Distribution Functions of Stellar Mass 
and Redshift 


Here we introduce the distribution functions that are 
incorporated into the tool. 


A.1. Skew Normal Distribution 


The standardized skew normal distribution with parameter a 
and of a random variable x is given by the PDF 


N(x; a) = Xn + erf (ax/ 42)]. (Al) 


In the general case, the skew normal distribution is obtained by 
rescaling and translating x, 


N(x; a, C, w) = (=; a) (A2) 
w 


For a given set of summary statistics, the parameters a, C, w can 
be fixed by matching the 16th and 84th percentiles and the 
median. The skew normal distribution (A2) cannot describe a 
very skewed distribution with the ratio between two errors 
exceeding 1.55. In this extreme case, we replace the linear 
transformation on x in (A1) with a nonlinear one. Since this 
procedure is quite complicated and does not have much to 
do with the scientific result in the present work, we 
refer interested readers to the publicly available source code 
http:/ /zhigihuang.top/codes/fitskew.py. 


A.2. Skew Triangular and Skew Rectangular 


For parameters a» 0, b>0 and u, the skew triangular 
distribution of a random variable x is defined as 


B ifu—acxt&mq 
a 
; = b — ; 
Tis dud n m ifucxcpu-cb; ed 
0, else. 
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The skew rectangular distribution is defined as 


Ls ifp—-a<x<cp; 


2a 
R(x; = A4 
95 ids h) = if<x<pthb; (9 
0, else. 
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